Query Expansion for Noisy Legal Documents

Authors

  • Lidan Wang
  • Douglas W. Oard
Abstract

The vocabulary of the TREC Legal OCR collection is noisy and very large. Standard techniques for improving retrieval performance, such as content-based query expansion, are ineffective for such a collection. In our work, we focused on exploiting metadata using blind relevance feedback, on iterative improvement starting from the reference Boolean run, and on the effects of using terms from different topic fields for automatic query formulation. This paper describes our methods and results.
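To make the idea of blind relevance feedback over metadata concrete, here is a minimal sketch. It is not the authors' system: the field names `ocr_text` and `metadata`, the toy term-count scoring, the cutoffs, and the sample documents are all assumptions chosen for illustration. The first pass ranks documents on the noisy OCR text, the top-ranked documents are assumed to be relevant, and expansion terms are drawn from their cleaner metadata rather than from the OCR vocabulary.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()

def score(query_terms, text):
    """Toy score (assumption): total occurrences of the query terms in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[t] for t in query_terms)

def rank(query_terms, docs, field):
    """Return ids of documents with a nonzero score, best first (ties by id)."""
    scored = [(score(query_terms, d[field]), doc_id) for doc_id, d in docs.items()]
    ordered = sorted(scored, key=lambda p: (-p[0], p[1]))
    return [doc_id for s, doc_id in ordered if s > 0]

def expand_from_metadata(query_terms, docs, top_docs=2, top_terms=2):
    """Blind relevance feedback: treat the top-ranked documents as relevant
    and append their most frequent metadata terms to the query."""
    feedback_ids = rank(query_terms, docs, "ocr_text")[:top_docs]
    pool = Counter()
    for doc_id in feedback_ids:
        pool.update(t for t in tokenize(docs[doc_id]["metadata"])
                    if t not in query_terms)
    # Most frequent first; alphabetical order breaks ties deterministically.
    best = sorted(pool.items(), key=lambda p: (-p[1], p[0]))[:top_terms]
    return list(query_terms) + [t for t, _ in best]

# Hypothetical toy collection: OCR text contains recognition errors,
# metadata is short but clean.
docs = {
    "d1": {"ocr_text": "tobacco advertis1ng youth rnarket",
           "metadata": "marketing memo youth"},
    "d2": {"ocr_text": "tobacco youth survey",
           "metadata": "marketing survey youth"},
    "d3": {"ocr_text": "annual budget report",
           "metadata": "finance report"},
}

expanded = expand_from_metadata(["tobacco", "youth"], docs)
print(expanded)
```

The expanded query would then be run in a second retrieval pass. Drawing the expansion terms from metadata keeps OCR errors such as `rnarket` out of the query, which is one plausible reason content-based expansion struggles on a collection like this.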

Similar resources

A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections

To bridge the vocabulary gap between the user’s information need and documents in a specific user-generated content environment, the blogosphere, we apply a form of query expansion, i.e., adding and reweighting query terms. Since the blogosphere is noisy, query expansion on the collection itself is rarely effective, but external, edited collections are more suitable. We propose a generative model...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users. Constructing an adequate query that best specifies the user's information need to the search engine is an important concern for web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

A Cross-language Information Retrieval Based on an Arabic Ontology in the Legal Domain

In this paper, we describe a web-based multilingual tool for Arabic information retrieval based on ontology in the legal domain. We illustrate the manual construction of the ontology and the way it is edited using Protégé2000. Using Arabic (UN) documents we identify the legal terms and the semantic relations between them before mapping them onto their position in the ontology. The process of se...

On the importance of Legal Catchphrases in Precedence Retrieval

This paper presents our working notes for FIRE 2017, Information Retrieval from Legal documents -Task 2 (Precedence retrieval). Common Law Systems around the world recognize the importance of precedence in Law. In making decisions, Judges are obliged to consult prior cases that had already been decided to ensure that there is no divergence in treatment of similar situations in different cases. ...

Interactive Query Refinement for Boolean Search

Boolean search is still the method of choice for many kinds of professional search, such as constructing systematic reviews in legal and medical fields. It is effective for fast, high-recall document classification. Its drawback is the difficulty in crafting a Boolean query that captures semantically relevant documents. Ambiguous search terms lead to the inclusion of non-relevant documents. We ...

Publication date: 2008